Exact Discovery of Length-Range Motifs
Abstract
Motif discovery is the problem of finding unknown patterns that appear frequently in real-valued time series. Several approaches have been proposed to solve this problem with no a priori knowledge of the time series or of the motif characteristics. The MK algorithm is the de facto standard for exact motif discovery, but it can discover only a single motif of a known length. In this paper, we argue that extending this algorithm to handle multiple motifs of variable lengths is not trivial when maximum-overlap constraints must be satisfied, as is the case in many real-world applications. The paper proposes an extension of the MK algorithm, called MK++, that handles these conditions. We compare this extension with a recently proposed approximate solution and show that it is not only guaranteed to find the exact top pair-motifs but is also faster. The proposed algorithm is then applied to several real-world time series.
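To make the problem statement concrete, the following is a minimal brute-force sketch of exact pair-motif search over a range of lengths. It is not the MK or MK++ algorithm, whose details the abstract does not give; it is only the naive baseline that such algorithms aim to speed up. The function names, the use of z-normalized Euclidean distance, the division by the square root of the length to compare candidates of different lengths, and the simplification of the overlap constraint to "no overlap at all" are all assumptions made for illustration.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros (assumed convention)."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in window]


def best_pair_motif(series, min_len, max_len):
    """Return (distance, length, i, j) for the closest non-overlapping pair.

    Hypothetical helper: every length in [min_len, max_len] is scanned
    exhaustively, so the result is exact but the cost is high.
    """
    best = None
    for length in range(min_len, max_len + 1):
        starts = range(len(series) - length + 1)
        windows = [znorm(series[s:s + length]) for s in starts]
        for i in range(len(windows)):
            # Starting j at i + length forbids any overlap (assumed constraint).
            for j in range(i + length, len(windows)):
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in zip(windows[i], windows[j]))
                )
                dist /= math.sqrt(length)  # assumed length normalization
                if best is None or dist < best[0]:
                    best = (dist, length, i, j)
    return best


# A short series with the pattern [0, 1, 3, 1, 0] planted at positions 0 and 8.
series = [0, 1, 3, 1, 0, 5, 2, 7, 0, 1, 3, 1, 0, 9, 4]
result = best_pair_motif(series, min_len=4, max_len=5)
```

Because the strict comparison keeps the first best candidate found, ties between lengths are resolved in favor of the shorter length in this sketch; a real application would need an explicit tie-breaking and overlap policy.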
Similar resources
Efficient Discovery of Variable-length Time Series Motifs with Large Length Range in Million Scale Time Series
Detecting repeated variable-length patterns, also called variable-length motifs, has received a great amount of attention in recent years. Current state-of-the-art algorithm utilizes fixed-length motif discovery algorithm as a subroutine to enumerate variable-length motifs. As a result, it may take hours or days to execute when enumeration range is large. In this work, we introduce an approxima...
Efficient Discovery of Time Series Motifs with Large Length Range in Million Scale Time Series
Detecting repeated variable-length patterns, also called variable-length motifs, has received a great amount of attention in recent years. Current state-of-the-art algorithm utilizes fixed-length motif discovery algorithm as a subroutine to enumerate variable-length motifs. As a result, it may take hours or days to execute when enumeration range is large. In this work, we introduce an approxima...
Encoded Expansion: An Efficient Algorithm to Discover Identical String Motifs
A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the c...
Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences
Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound...
Efficient Discovery of Common Patterns in Sequences
Finding motifs or repeated patterns in data is of wide scientific interest [11, 8, 5, 6]. For example, elucidating motifs in DNA sequences is a critical first step in understanding biological processes as basic as the RNA transcription. There, the motifs can be used to identify promoters, the regions in DNA that facilitate the transcription. Finding motifs can be equally crucial for analyzing i...
Publication date: 2014